Using N-grams to Process Hindi Queries with Transliteration Variations

Authors

  • Anand Natrajan
  • Allison L. Powell
  • James C. French
Abstract

Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems; N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a collection of Hindi songs. Within this retrieval system, we study the effect of varying N on retrievability. Additionally, we present an alternative spell-checking tool based on N-grams. We conclude with a discussion of the number of N-grams produced by different values of N for different languages and a discussion of the choice of N.
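To illustrate why N-gram matching tolerates transliteration differences, here is a minimal Python sketch. It is not the authors' system: the function names `char_ngrams` and `dice`, the `_` padding character, the Dice coefficient as the similarity measure, and the example spellings "zindagi" and "jindagi" are all assumptions chosen for illustration.

```python
def char_ngrams(word, n):
    """Return the set of character N-grams of a word.

    The word is lowercased and padded with n-1 underscores on each side
    (an illustrative convention) so that word boundaries produce N-grams too.
    """
    padded = "_" * (n - 1) + word.lower() + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the N-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty words are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Two hypothetical Roman-script spellings of the same Hindi word.
query, indexed = "jindagi", "zindagi"

# A word-based system requires an exact match and finds nothing.
print(query == indexed)           # False

# Bigrams: 6 of the 8 N-grams on each side are shared.
print(dice(query, indexed, n=2))  # 0.75
```

Larger values of N make each N-gram more specific, so a single differing letter removes more of the overlap; this is one way to see why the choice of N affects retrievability.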


Similar resources

Transliterated Search using Syllabification Approach

Machine transliteration refers to the process of automatic conversion of a word from one language to another without losing its phonological characteristics. In this work, we present our experiments performed in subtask-1 and subtask-2 as a part of the FIRE-2013 transliterated search task. In both the subtasks, the transliteration from Roman script to Devanagari script was performed using sylla...


Urdu Hindi Machine Transliteration using SMT

Transliteration is a process of transcribing a word of the source language into the target language such that when the native speaker of the target language pronounces it, it sounds as the native pronunciation of the source word. Statistical techniques have brought significant advances and have made real progress in various fields of Natural Language Processing (NLP). In this paper, we have ana...


Hindi to Punjabi Transliteration using Phonetic and Orthographic Rules

One of the important applications of Natural Language Processing is machine translation. Machine transliteration is an emerging and very important research area in the field of machine translation. Translation systems translate a message from the source language to the target language, keeping the exact meaning. While the transliteration system finds the same meaning word/sentence in another language, ...


Hybrid Approach for Hindi to English Transliteration System for Proper Nouns

In this paper, a hybrid approach is presented to transliterate proper nouns written in Hindi into their English equivalents. The hybrid approach combines direct mapping, a rule-based approach, and a statistical machine translation approach. Transliteration is a process to generate the words from the source language to the target language. The reverse process is known ...


IIIT Hyderabad’s CLIR experiments for FIRE-2008

This paper describes our CLIR experiments performed for the FIRE workshop. We submitted runs for ad-hoc monolingual document retrieval in Hindi and English, and for ad-hoc cross-lingual document retrieval from Hindi to English and from English to Hindi. In this paper, we describe our English to Hindi and Hindi to English CLIR systems and the experiments conducted on them using the FIRE2008 data...




Publication date: 1997